ASP : A STATISTICAL PACKAGE
===========================
The programs on this disk comprise a statistical analysis package. Complete
documentation, sample problems etc. can be found in 'Computing in Statistical
Science Through APL' by Francis Anscombe, published by Springer-Verlag,
New York Inc. 1981.
Two conventions should be noted:
1) Abscissas are always mentioned before ordinates (as with arguments of
'regrinit' and 'scatterplot').
2) When numerical vectors are stacked in a matrix, they are the columns
(as with the result of 'jacobi' and the argument of 'downplot').
* * * * *
Let us look at the workspaces and their contents.
UTILITY:
enter: Enter data for storage in an array called matrix.
store: Save data to disk.
print: Print output to ieee4 printer.
test: Test carriage control.
read: Get file from disk.
listfile: List a disk file.
MULT/REGR: Multiple regression by stages and examination of residuals.
x regrinit y: Is used once at the outset to set up global variables for 'regr'.
The first argument X is a matrix whose columns list values of the independent
variables. Usually one column is all 1's and its index number is the
argument in the first call of 'regr'. Each column of X should have been
multiplied by powers of 10 so that the unit place is the last significant
one. The second argument Y is either a vector listing values of one
dependent variable or a matrix whose columns list values of several
dependent variables. Just as with X, the unit place should be the last
significant one.
regr l: Performs regression on one or more designated independent variables.
The argument L is a scalar or vector listing the index no.(s) of the
independent variable(s) to be brought into the next regression.
show v: May be used at any stage to obtain summary information about a vector.
The argument V is a vector, such as RY (if a vector) or a column of RY
(if a matrix) or a column of RX or DIAGQ. The frequency distribution in the
output is over six intervals of equal length. The 1st and 6th are centered
on the least and greatest values occurring in V. 'show' refers to no global
variables, and may be used outside this regression context.
stres: Gives standardized residuals of the dependent variable(s) for use in
scatterplots. No arguments. Should be used only after 'regr' has been
executed.
variance: Yields the conventional estimated variance matrix of the regression
coefficients. No arguments. Use only after 'regr' has been executed.
n sample cp: Draws a sample of size N from a distribution over non-negative
integers having cumulative probabilities CP.
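As a rough illustration of what 'sample' is described as doing, here is a
minimal Python sketch (not the APL function on the disk). It assumes that
CP[1] is the cumulative probability of the value 0, CP[2] of the value 1, and
so on; the name sample and the rng parameter are choices made for this sketch.

```python
import bisect
import random

def sample(n, cp, rng=random):
    """Draw n values from {0, 1, ..., len(cp)-1} given cumulative probabilities cp.

    Assumption: cp[0] is P(value <= 0), cp[1] is P(value <= 1), and so on.
    """
    top = len(cp) - 1
    draws = []
    for _ in range(n):
        u = rng.random()  # uniform on [0, 1)
        # Index of the first cumulative probability that exceeds u. The clamp
        # guards against a final cp entry slightly below 1 through rounding.
        draws.append(min(bisect.bisect_right(cp, u), top))
    return draws
```

For example, sample(10, [0.2, 0.5, 1.0]) returns ten values drawn from 0, 1
and 2 with probabilities 0.2, 0.3 and 0.5.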
APLTESTS: Tests on residuals after least squares linear regression.
test i: Tests for distribution shape, heteroscedasticity, nonadditivity, and
serial correlation are carried out on residuals from a fitted regression
relation. Several global variables from 'regr' are needed. Special cases
of the tests appear in 'rowcol' and 'summarize'. The argument I is either
1 or 2, controlling which subsidiary function is used for computing moments
of the test statistics. (*This routine appears to be missing from the disk.)
tests1: Calculates exact 2nd and 3rd moments.
tests2: Calculates exact 2nd and approximate 3rd moments.
ccd s: Carries out a complete Cholesky decomposition.
REG/PLOT
x cor y: Calculate correlation coefficient between vectors X and Y.
stdize x: The vector X is rescaled to have zero mean and unit variance.
rlogistic n: N random logistic deviates, mean 0, variance 1.
rnormal n: N random normal deviates by the Box-Muller method.
p quantiles v: Quantiles of the vector V for given proportions P.
x fit y: Causes information to be displayed about the means of the variable and
the regression coefficient, together with a conventional estimated standard
error for the latter, calculated as though the errors were independent.
nif x: Gives the Hastings approximation to the normal integral function.
X may be any numerical array.
u scatterplot v: Used for displaying corresponding members of two vectors.
U and V are vectors of equal length. Corresponding members U[j] and V[j]
are plotted as abscissa and ordinate of a point.
u tscp v: A triple scatterplot in which a third dimension is suggested by
varying the symbols used in plotting the points.
downplot v: Is used for plotting members of one or more vectors against their
index numbers. The argument V is either a vector or a matrix. If V is a
vector, V[j] is plotted against j. Otherwise, V[j;] is plotted against j.
z tdp v: A triple downplot in which the third dimension is suggested by
varying the symbols used in plotting the points. The Z argument is a
character scalar or vector or matrix, indicating the symbols to be used in
plotting the points. If Z is scalar, the same symbol is used every time.
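The descriptions of 'stdize' and 'cor' above can be illustrated with a short
Python sketch. This is not the APL code from the disk. The divisor n - 1 in
the variance is an assumption, since the description does not say which
divisor is used.

```python
import math

def stdize(x):
    """Rescale x to zero mean and unit variance (assumed divisor: n - 1)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / (n - 1)
    sd = math.sqrt(var)
    return [(v - mean) / sd for v in x]

def cor(x, y):
    """Product-moment correlation coefficient between vectors x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

The correlation does not depend on the divisor, because it cancels between
numerator and denominator.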
TABLES
summarize y: Provides summary statistics of a data set. The argument Y may
be any numerical array having at least 4 members. Measures of location,
scale, shape of distribution are displayed.
rowcol y: Performs a standard analysis of variance on a row-column cross-
classification, along with an additive analysis of a two-way table, with
tests on residuals.
rowcolpermute: Permutes global variables RE, CE and RY. Use after 'rowcol' and
before 'rowcoldisplay'. Puts CE in ascending and RE in descending order.
rowcoldisplay i: Is a special function substituting for 'tscp', that may be
used to display the output of 'rowcol'. Abscissas and ordinates are the
column effects and the row effects. The argument I is the change in column
effect represented by a unit horizontal displacement.
analyze y: Begins the analysis of variance of a perfect rectangular array.
effect v: Estimates a designated main effect or a designated interaction.
The argument V is a list of one or more coordinate numbers, just one for a
main effect, two or more for an interaction.
mp x: This routine, median polish, fits an additive structure to a two-way
table by repeatedly subtracting medians of rows and medians of columns.
n bartlett s: Bartlett's test for homogeneity of variances. The second
argument is a vector of unbiased variance estimates. The first argument
is the degrees of freedom, either the common value for all the variance
estimates or a vector of values, one for each variance estimate. Gives Box's
approximation by the F distribution and Bartlett's original chi-square
approximation.
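The median polish idea described under 'mp' above can be sketched in Python as
follows. This is an illustration only, not the APL routine: the function name,
the sweeps limit and the choice to return separate row and column effects are
assumptions made for this sketch.

```python
from statistics import median

def median_polish(table, sweeps=10):
    """Fit table[i][j] = row_eff[i] + col_eff[j] + resid[i][j] by median polish."""
    resid = [list(row) for row in table]
    nrow, ncol = len(resid), len(resid[0])
    row_eff = [0.0] * nrow
    col_eff = [0.0] * ncol
    for _ in range(sweeps):
        changed = False
        # Subtract the median of each row and add it to that row's effect.
        for i in range(nrow):
            m = median(resid[i])
            changed = changed or m != 0
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        # Subtract the median of each column and add it to that column's effect.
        for j in range(ncol):
            m = median(resid[i][j] for i in range(nrow))
            changed = changed or m != 0
            col_eff[j] += m
            for i in range(nrow):
                resid[i][j] -= m
        if not changed:  # every median is already zero
            break
    return resid, row_eff, col_eff
```

For a table that is exactly additive, the residuals come out as zero after a
single sweep.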
CONTINGENCY: Analysis of contingency tables.
contingency x: Applies to two dimensional contingency tables and performs a
chi-square test of association, with display of standard residuals. The
argument X must be a matrix of non-negative numbers, having no zero marginal
totals.
fourfold x: Also applies to two dimensional tables and is applied when the
categories of each classification are ordered. Empirical log crossproduct
(fourfold) ratios are displayed. A Plackett distribution is fitted and
goodness of fit is tested by chi-squared, with display of standardized
residuals. The argument X is the same as in 'contingency'.
multipoly x: Applies to contingency tables in any number of dimensions finding
empirical log crossproduct ratios analogous to those of 'fourfold' for two
dimensions. The argument X must be an array of non-negative integers in 2
or more dimensions.
v pool x: Also applies to multi-dimensional arrays, pooling categories in any
table. The second argument X is an array in 2 or more dimensions, typically
a contingency table or a table of expected frequencies. The first argument
V is a vector with at least 3 elements. V[1] specifies the coordinate and
1(down arrow)V specifies the index values, over which there is to be
pooling. Sections of X corresponding to index values 2(down arrow)V are
added to the section with index value V[2], and then the former sections
are deleted. Thus if X is a matrix with 5 rows and V is 1 4 5 1, pooling
will be over the 1st coordinate (rows); the contents of rows 4, 5 and 1 will
be added and placed in row 4, and rows 5 and 1 will be dropped, so the
result Y has three rows.
lfact x: Calculates the factorials of X.
n csif x: The tabulated chi-squared integral function.
inif p: Odeh-Evans approximation to the inverse of the normal integral function.
ctg2 x: A function similar to 'contingency' except that the probability-of-the-
sample statistic and also the likelihood-ratio statistic are calculated
instead of Pearson's chi-squared. Calls 'lfact', 'csif' and 'inif'.
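The pooling rule described under 'pool' above, including its 5-row example,
can be sketched in Python. This sketch is restricted to two dimensional
tables, whereas the description allows any number of dimensions, and it is not
the APL function from the disk.

```python
def pool(v, x):
    """Pool sections of the 2-D table x.

    v = [coordinate, target index, other indices...], all 1-based as in APL.
    Assumption: coordinate 1 means rows and coordinate 2 means columns.
    """
    axis, keep, drop = v[0], v[1], v[2:]
    # Work on rows; for the 2nd coordinate, transpose first and transpose back.
    rows = [list(r) for r in (x if axis == 1 else zip(*x))]
    for d in drop:
        rows[keep - 1] = [a + b for a, b in zip(rows[keep - 1], rows[d - 1])]
    dropped = {d - 1 for d in drop}
    rows = [r for i, r in enumerate(rows) if i not in dropped]
    return rows if axis == 1 else [list(c) for c in zip(*rows)]
```

With a 5-row matrix and V equal to 1 4 5 1, the sketch adds rows 4, 5 and 1
into row 4 and drops rows 5 and 1, leaving three rows (the original rows 2 and
3, then the pooled row).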
FUNCTIONS
x max y: Maximization of a function of one variable. The arguments are vectors
of length 3, the first having no two members equal. The explicit result is
the coordinates of the vertex of a parabola with vertical axis, that goes
through the three points whose abscissas are the first argument and
ordinates the second argument.
h integrate a: Integration of one dimensional definite integrals. The first
argument H, a scalar, is step size. The second argument A is a vector
of 2 or more limits of integration, in ascending order, with differences all
divisible by H. The explicit result Z is a vector of length 1 less than
the length of A, listing the definite integrals from A[1] to each of the
other members of A. The function to be integrated is asked for, and must
be expressed in terms of an argument X (a local variable). For example, to
integrate 'sin x' from 0 to each of 1 2 3 and compare the result with
'1 - cos x', enter 0.1 integrate -1+1 2 3 4, then 1ox when asked for the
function; the comparison values are 1-2o1 2 3.
inif p: See 'CONTINGENCY'
nif x: See 'REG/PLOT'
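The result described under 'max' above, the vertex of a vertical-axis parabola
through three points, can be worked out with divided differences. The Python
below is a sketch of that calculation, not the APL function; the name
parabola_vertex is chosen for this illustration.

```python
def parabola_vertex(x, y):
    """Vertex (xv, yv) of the vertical-axis parabola through three points.

    x holds three distinct abscissas, y the corresponding ordinates.
    """
    (x0, x1, x2), (y0, y1, y2) = x, y
    d1 = (y1 - y0) / (x1 - x0)   # first divided differences
    d2 = (y2 - y1) / (x2 - x1)
    a = (d2 - d1) / (x2 - x0)    # second divided difference = leading coefficient
    if a == 0:
        raise ValueError("the three points are collinear; there is no vertex")
    # p(x) = y0 + d1*(x - x0) + a*(x - x0)*(x - x1); solve p'(xv) = 0.
    xv = (x0 + x1) / 2 - d1 / (2 * a)
    yv = y0 + d1 * (xv - x0) + a * (xv - x0) * (xv - x1)
    return xv, yv
```

The vertex is a maximum only when the leading coefficient is negative, so a
caller using this for maximization would want to check that the middle point
lies above the chord joining the outer two.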
TIME:
cw r: Calculates weights for a cosine-weighted moving average of length R.
m filter x: For filtering time series. The first argument M is a pair of
integers defining the filter. The second argument X is a vector of data to
be filtered. Filtering consists of subtracting a cosine-weighted moving
average of extent M[2] from a similar moving average of extent M[1]. The
resulting weights are displayed, together with their sum of squares. The
elements of M must be either both odd or both even, or else one of them
must be 0.
w mav x: Moving average or filtering of a series X with weights W. A moving
average with arbitrary weights is taken of either one vector or several
vectors simultaneously. The first argument W is the vector of weights.
The second argument X is either a vector of data or a matrix whose columns
are the vectors of data to be averaged. The result U is either a vector or a
matrix.
k taper u: Fourier analysis of time series. The first argument K is a positive
integer and the second U is a vector of length greater than 2xK.
d fft v: Fast Fourier transform. The first argument D is scalar, either 0, 1 or
2. The second argument V is a three dimensional array. If D is 0, the
function yields the complex Fourier transform of a single complex time
series. If D is 1, the function yields the real Fourier transform of a
single real series; the transform is scaled to give a direct analysis of
variance. When D is 2 the function yields simultaneously the real transforms
of two real series, each scaled as when D is 1.
polar s: Is used to transform the output of 'fft' to polar form. The argument
S is a matrix with 2 columns, such as the result of 'fft' when D is 1.
w ma x: Moving average or filtering of a series X with weights W.
*** The next five routines carry out a harmonic regression of a time series
on one or more other time series.
prehar: Generates phase-difference plots between pairs of series. The data is
passed to 'prehar' in the global variable FT, a three dimensional array.
harinit u: Initializes for the remaining functions.
b har1 v: Performs harmonic regression of a dependent series on one predictor
series.
b har1r v: Generates the residual series after execution of 'har1'.
b har2 v: Performs harmonic regression of a dependent series on two predictor
series.
tsnt u: Carries out a normality test on a time series. The argument is a vector
of innovations.
mardia x: Mardia's multivariate kurtosis test.
end: Invoked by 'tsnt' and 'mardia'.
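A moving average with arbitrary weights, as described under 'mav' above, can
be sketched in Python for the single-vector case. Two assumptions are made
here that the description does not settle: the weights are applied in the
order given (no reversal), and the ends of the series are not padded, so the
result is shorter than the input.

```python
def mav(w, x):
    """Weighted moving average of the vector x with weights w.

    Assumptions: weights are applied in the order given, and no padding is
    used, so the result has len(x) - len(w) + 1 members.
    """
    k = len(w)
    return [sum(wi * xi for wi, xi in zip(w, x[i:i + k]))
            for i in range(len(x) - k + 1)]
```

For a matrix whose columns are the series, the same function could be applied
to each column in turn.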
GAIN
f gain w: Gain of a linear filter. If the weights are symmetric, the lag is
taken to be constant, equal to 0.5x-1+pw, and the gain may be negative.
If the weights are not symmetric, the lag is not computed and g is merely
the magnitude of the gain.
f gainlag w: Is used to calculate lag and gain when W is not symmetric. F is
supposed to be in increasing order, and the gain should not vanish at any
member of F; 'gain' may be run to verify this. Consecutive members of F
should be close enough together for phase changes to be small.
n autocov v: Calculates the first N serial correlations of V.
x fit y: See REG/PLOT
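The behaviour described under 'gain' above can be sketched in Python using the
standard frequency response of a linear filter. This is an assumption-laden
illustration, not the APL function: frequencies are taken to be in cycles per
sampling interval, and the symmetry check tolerance is arbitrary. The constant
lag 0.5 x (number of weights - 1) follows the description.

```python
import cmath
import math

def gain(f, w):
    """Gain of the linear filter with weights w at each frequency in f.

    Assumption: frequencies are in cycles per sampling interval.
    """
    n = len(w)
    symmetric = all(abs(w[k] - w[n - 1 - k]) < 1e-12 for k in range(n))
    lag = 0.5 * (n - 1)
    out = []
    for freq in f:
        if symmetric:
            # With the constant lag removed the response is real, and it may
            # be negative.
            g = sum(wk * math.cos(2 * math.pi * freq * (k - lag))
                    for k, wk in enumerate(w))
        else:
            # Otherwise only the magnitude of the complex response is returned.
            g = abs(sum(wk * cmath.exp(-2j * math.pi * freq * k)
                        for k, wk in enumerate(w)))
        out.append(g)
    return out
```

A three-point equal-weight average evaluated at frequency 0.5 gives a gain of
-1/3, which shows the negative gain mentioned in the description.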
PLACKETT
p1 pd2 p2: The explicit result is a two dimensional Plackett distribution
having discrete marginal distributions with probabilities listed in the
arguments. Each argument must consist of at least two positive numbers
summing to 1. One global variable must be defined before execution, the
natural logarithm of the fourfold crossproduct ratio of probabilities.
pd3: Requires 7 global variables to be defined before execution: P1, P2 and
P3 list probabilities in the discrete univariate marginal distributions;
L12, L13 and L23 give the log fourfold ratios in the bivariate marginal
distributions; TOL is a positive tolerance such as 0.001.
x biv y: Invoked by 'pd2' and 'pd3'.
a pp b: Finds the product of two polynomials.
collect a: Used with 'pp'; collects like terms in a polynomial.
n isotropy l: Test of sphericity of a multivariate normal distribution. The
first argument N is the number of degrees of freedom in the variance
matrix. The second argument L is a vector of positive roots of the
variance matrix.
c jacobi x: Characteristic roots and vectors of a symmetric matrix. The second
argument X is the given matrix, to be transformed towards diagonal form by
Jacobi's method. The first argument C, a positive scalar, is the tolerance
for off-diagonal elements in the transformed matrix.
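Jacobi's method, as described under 'jacobi' above, can be sketched in Python
with cyclic plane rotations. This is a textbook version, not the APL function
from the disk. The max_sweeps limit and the form of the result (a list of
roots plus a matrix whose columns are the vectors) are assumptions of this
sketch.

```python
import math

def jacobi(tol, x, max_sweeps=50):
    """Characteristic roots and vectors of the symmetric matrix x.

    Rotations continue until every off-diagonal element is at most tol in
    magnitude. Returns (roots, vectors); column j of vectors goes with roots[j].
    """
    n = len(x)
    a = [[float(v) for v in row] for row in x]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = max((abs(a[i][j]) for i in range(n) for j in range(i + 1, n)),
                  default=0.0)
        if off <= tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that makes the (p, q) element zero.
                tau = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, tau) / (abs(tau) + math.sqrt(1.0 + tau * tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                for k in range(n):   # rotate columns p and q of a
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):   # rotate rows p and q of a
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):   # accumulate the rotation in the vectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v
```

The roots are returned in whatever order the rotations leave them on the
diagonal, so a caller wanting them sorted would need to reorder the roots and
the columns together.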
REG7: Regression when errors have a type 7 (or 2) distribution.
t7init: Initializes.
t7lf k: Evaluates the marginal likelihood function L at a trial set of values
of the regression parameters, after integration of the likelihood with
respect to the scale parameter and the shape parameter. The argument K
specifies the order of derivatives needed, 0 for L, 1 for L and DL, 2 for
L, DL, and D2L. The argument should be 2 at first execution.
t7s q: Estimates the change in the regression parameters needed to reach the
maximum of the marginal likelihood. The argument Q should be not less
than 3/n.
t7a: Invoked by 't7lf'.
t7b: Invoked by 't7lf'.
HOUSEHOLDER: Regression by Householder transformations, uncorrelated residuals.
x hht y: Arguments as per 'regrinit', except that the last significant digit in
all columns of X that have observational error should be in the same place,
but not necessarily in the unit place.
HUBER: Robust regression
r huber z: The function 'huber' performs one cycle of iteration towards
minimizing the sum of a function rho of the residuals in a regression
problem, where rho is defined in terms of a positive constant K.
The first argument R must be scalar and either -1 or between 0 and 1.
The second argument Z is the array of residuals corresponding to a trial
setting of the regression parameter 'beta', which must be pre-specified.
huber1: Is invoked when Z is a vector. There must be a global matrix X whose
columns are the independent variables in a regression.
huber2: Is invoked when Z is a matrix. Z must have at least 3 rows and 3
columns. The usual additive structure is fitted.
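The description of 'huber' above does not spell out the function rho. The
Python sketch below assumes the usual Huber definition (quadratic within K of
zero, linear beyond), which may differ in scaling from what the APL function
uses. The names huber_rho and huber_psi are chosen for this illustration.

```python
def huber_rho(z, k):
    """Assumed Huber rho: z*z/2 when |z| <= k, otherwise k*|z| - k*k/2."""
    a = abs(z)
    return 0.5 * z * z if a <= k else k * a - 0.5 * k * k

def huber_psi(z, k):
    """Derivative of the assumed rho: the residual clipped to [-k, k]."""
    return max(-k, min(k, z))
```

Under this definition small residuals are treated as in least squares, while
residuals larger than K in magnitude contribute only linearly, which limits
the influence of outliers.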
ASPDATA: Contains various test files used in the book.
print: Prints a variable: i.e. print varname.
enter: As in UTILITY.
**** list of variables in 'aspdata'
loblollydata: Average heights of loblolly pines in feet.
enrol: Enrolment at Yale University 1796-1975.
imports: Imports of merchandise 1790-1975.
year: A list of year numbers 1796 to 1975.
butter: Wholesale price of butter at New York (cents/pound) 1830-1975.
housing: Total number of new housing units started (thousands) 1889-1975.
se70: School expenditures, 51 points of data, one for each state.
pi68: Personal incomes, 51 points of data, one for each state.
y69: Young persons, same as 'se70'.
urban70: Proportion urban population, same as 'se70'.
m1: White-Eisenberg table, stomach cancer site and blood group.
m2: Graunt table, sex of christenings in London and country.
m3: Francis table, summary of 1954 poliomyelitis vaccine trial.
m4: Kiser-Whelpton table, education of wife and fertility planning.
m5: Gilby table, clothing and intelligence rating of schoolboys.
m6: Stuart table, men's distance vision in right and left eyes.
m7: Glass table, social mobility, father's status and son's status.
y349: A matrix of values on page 349.
x350: A matrix of values on page 350.
num: scalar=80; rec: scalar=81; vars: vector=80 1; vector: scalar=60
**************************** GENERAL COMMENTS ***********************
The functions described above work together using many global variables.
However, it appears that they can be organized into three groupings, namely
1) analyze cor downplot effect fit quantiles
regr regrinit rowcol rowcoldisplay rowcolpermute
scatterplot show stdize stres summarize variance
2) autocov contingency cw fft filter fourfold
harinit har1 har1r har2 huber huber1
huber2 ma mav multipoly polar pool
prehar taper tdp tscp
3) bartlett biv ccd collect csif ctg2
end gain gainlag hht inif integrate
isotropy jacobi lfact mardia max mp
nif pd2 pd3 pp rlogistic rnormal
sample tests tests1 tests2 tsnt t7a
t7b t7init t7lf t7s
as the author stored them in three separate workspaces on his system.
* * * * *
print/basic: This is a Waterloo BASIC program to print an APL data file.
* * * * *
DISCLAIMER:
This Waterloo text file has been added to this disk for your use and
as encouragement to try the statistical functions.
I am not a statistician and it is quite possible I have misinterpreted
the meaning or significance of some function or variable.
Bill Dutfield
November 11/83
* * * * *
A THANK YOU!
I did not enter the APL functions and I would like to thank that person
(unknown to me) who did. The effort he put into entering and checking these
functions was time consuming, but I believe worthwhile not only for himself
but for us. I would like to thank Roger Green for bringing them to the
attention of the club and for making them available to us, the SuperPET
group of TPUG (Toronto PET Users Group).
***** End of File *****